欢迎关注“小丫画图”公众号,同名知识星球等你加入

小丫微信: epigenomics E-mail: figureya@126.com

作者:Haitao Wang

Dr. Haitao Wang: JNU,Guagnzhou => GIBH-CAS,Guagnzhou => BNU,Beijing => IMCB,Singapore => UMAC,Macao => NCCS,Singapore

Email: ht.wang@yahoo.com; wang.haitao@nccs.com.sg

小丫注释、校验

需求描述

我要DIY出paper里的这种对比多个subtype的composite copy number profiles:

出自https://www.nature.com/articles/nature20805

GISTIC自带画图功能(见结尾“附二”),能画出类似于C这种图。从firehose能下载到TCGA各癌症的Segmented copy number profiles(类似于B)和Genomic positions of amplified regions(类似于C),但只有全部sample的图,我要对比不同subtype:

出自https://www.sciencedirect.com/science/article/pii/S2352396418302986?via%3Dihub

应用场景

此处提供的DIY画图方法更灵活,可以画gistic score,还可以画percentage/frequency。

可单独画一个sample,也可以对比多组subtypes。

环境设置

使用国内镜像安装包

options("repos"= c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
#其他物种到这里找包的名字http://bioconductor.org/packages/3.8/data/annotation/

加载包

library(BSgenome.Hsapiens.UCSC.hg19)
## Loading required package: BSgenome
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind,
##     colMeans, colnames, colSums, dirname, do.call, duplicated,
##     eval, evalq, Filter, Find, get, grep, grepl, intersect,
##     is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
##     paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
##     Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which, which.max,
##     which.min
## Loading required package: S4Vectors
## Loading required package: stats4
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Warning: package 'GenomeInfoDb' was built under R version 3.5.2
## Loading required package: GenomicRanges
## Loading required package: Biostrings
## Warning: package 'Biostrings' was built under R version 3.5.2
## Loading required package: XVector
## 
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
## 
##     strsplit
## Loading required package: rtracklayer
## Warning: package 'rtracklayer' was built under R version 3.5.2
Sys.setenv(LANGUAGE = "en") #显示英文报错信息
options(stringsAsFactors = FALSE) #禁止chr转成factor

输入文件的获得

画copy number profile需要gistic score和染色体信息,其中gistic score可以用GISTIC 2.0计算(输入segment file)。

如果你已经算出gistic scores,保存为“easy_input_*scores.gistic.txt”,就可以跳过这步,直接进入“准备染色体信息”。

获得subtype的gistic scores

下载全部样本的segment file,然后按subtype分开,用GISTIC 2.0计算gistic scores,然后用算出的gistic scores来画图。

第一步,下载全部样本的segment file

focal_input.seg.txt:segment file。从firehose下载hg19版本的TCGA 数据,例文用 ESCA,压缩包链接:http://gdac.broadinstitute.org/runs/analyses__2016_01_28/data/ESCA/20160128/gdac.broadinstitute.org_ESCA-TP.CopyNumber_Gistic2.Level_4.2016012800.0.0.tar.gz。解压缩,点击nozzle.html,查看其他结果图表和Methods。需要里面的focal_input.seg.txt和scores.gistic(后面直接用它画全部样本的图)两个文件。

hg38版本(TCGA数据)segment的获取方法见“附一”。

### segment information
tumor.seg.cnv <- read.table("focal_input.seg.txt", sep="\t", header=T, stringsAsFactors=F)
#tail(tumor.seg.cnv)
tumor.seg.cnv$Sample <- gsub("-[0-9A-Z]*-[0-9A-Z]*-[0-9A-Z]*-01$", "", tumor.seg.cnv$Sample)
#tail(tumor.seg.cnv)

第二步,拆分出subtype的segment file

从TCGA获得亚型的sample ID,然后把focal_input.seg.txt拆分成亚型的segment file

此处模仿例文,根据Histological.Type…Oesophagus和Gastric.classification分为四种subtype。实际操作时可根据每种癌症的具体情况选择合适的列来分亚型做对比。

# Get subtypes information
library(TCGAbiolinks)
## Warning: package 'TCGAbiolinks' was built under R version 3.5.2
dataSub <- data.frame(TCGAquery_subtype(tumor = "ESCA"))
## esca subtype information from:doi:10.1038/nature20805
# dataSub跟例文的Supplementary Table 1是同一个文件
table(dataSub$Histological.Type...Oesophagus)
## 
##  EAC ESCC 
##   72   90
table(dataSub$Tissue.level.Location)
## 
##     Gastric Oesophageal 
##         359         164
EAC_id <- dataSub[dataSub$Histological.Type...Oesophagus=="EAC", "patient"]
ESCC_id <- dataSub[dataSub$Histological.Type...Oesophagus=="ESCC", "patient"]
GA <- dataSub[dataSub$Tissue.level.Location=="Gastric", ]
table(GA$Gastric.classification)
## 
## CIN EBV  GS MSI 
## 188  30  66  75
GA_CIN_id <- dataSub[dataSub$Gastric.classification=="CIN", "patient"]
GA_nonCIN_id <- dataSub[dataSub$Gastric.classification!="CIN", "patient"]

# Get subtype segments information
EAC.seg.cnv <- tumor.seg.cnv[tumor.seg.cnv$Sample %in% EAC_id,]
ESCC.seg.cnv <- tumor.seg.cnv[tumor.seg.cnv$Sample %in% ESCC_id,]
GA_CIN.seg.cnv <- tumor.seg.cnv[tumor.seg.cnv$Sample %in% GA_CIN_id,]
GA_nonCIN.seg.cnv <- tumor.seg.cnv[tumor.seg.cnv$Sample %in% GA_nonCIN_id,]

write.table(EAC.seg.cnv, file="tumor.EAC.seg.txt", sep="\t", row.names=F, quote = F)
write.table(ESCC.seg.cnv, file="tumor.ESCC.seg.txt", sep="\t", row.names=F, quote = F)
write.table(GA_CIN.seg.cnv, file="tumor.GACIN.seg.txt", sep="\t", row.names=F, quote = F)
write.table(GA_nonCIN.seg.cnv, file="tumor.GAnonCIN.seg.txt", sep="\t", row.names=F, quote = F)

第三步,计算gistic score

注:segment file作为GISTIC 2.0的输入,用GISTIC 2.0计算gistic score的方法见“附二”。

计算将获得easy_input_*.scores.gistic.txt文件:

  • easy_input_scores.gistic.txt:所有样品的gistic score,跟focal_input.seg.txt位于同一压缩包,原文件名为scores.gistic。
  • easy_input_*.scores.gistic.txt:四种亚型的gistic score。

准备染色体信息

# Create a chromosomes reference objects function
chrom_extract <- function(BSgenome.hg  = NULL) {
  if (is.null(BSgenome.hg )) stop("NULL object !", call. = FALSE)
  obj <- list(species = GenomeInfoDb::organism(BSgenome.hg), genomebuild = BSgenome::providerVersion(BSgenome.hg))
  df <- data.frame(chrom = BSgenome::seqnames(BSgenome.hg), chrN = seq_along(BSgenome::seqnames(BSgenome.hg)), chr.length = GenomeInfoDb::seqlengths(BSgenome.hg), stringsAsFactors = FALSE)
  df <- df[1:24,]
  df$chr.length.sum <- cumsum(as.numeric(df$chr.length))
  df$chr.length.cumsum <- c(0, df$chr.length.sum[-nrow(df)])
  df$middle.chr <- round(diff(c(0, df$chr.length.sum)) /2)
  df$middle.chr.genome <- df$middle.chr + df$chr.length.cumsum
  obj$chromosomes <- df
  obj$chrom2chr <- sapply(obj$chromosomes$chrom, function(k) { obj$chromosomes$chrN[obj$chromosomes$chrom == k]}, simplify = FALSE)
  obj$chr2chrom <- sapply(obj$chromosomes$chrN, function(k) { obj$chromosomes$chrom[obj$chromosomes$chrN == k]}, simplify = FALSE)
  names(obj$chr2chrom) <- obj$chromosomes$chrN
  obj$genome.length <- sum(as.numeric(obj$chromosomes$chr.length), na.rm = TRUE)
  return(obj)
}

# Extract a chromosomes reference loci
BSgenome.hg = "BSgenome.Hsapiens.UCSC.hg19"
BSg.obj <- getExportedValue(BSgenome.hg, BSgenome.hg)
genome.version <- BSgenome::providerVersion(BSg.obj)
chrom <- chrom_extract(BSg.obj)
#str(chrom)

开始画图

分别画全部样本的gistic score和percentage/frequency,再画四个subtye对比的gistic score和percentage/frequency。

画全部sample的gistic score

#pdf("ESCA_copy_number_gistic_score.pdf",12,5)
# Import gistic2 results read gistic output file
scores <- read.table("easy_input_scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
head(scores)
##   Type Chromosome   Start     End X.log10.q.value.  G.score
## 1  Amp          1 3218610 3229258                0 0.134372
## 2  Amp          1 3235468 3279324                0 0.141093
## 3  Amp          1 3280814 3522124                0 0.128763
## 4  Amp          1 3535644 3565825                0 0.136513
## 5  Amp          1 3575833 3614480                0 0.128763
## 6  Amp          1 3614576 3654595                0 0.136279
##   average.amplitude frequency
## 1          0.419716  0.206522
## 2          0.422953  0.211957
## 3          0.419967  0.201087
## 4          0.433913  0.201087
## 5          0.417425  0.201087
## 6          0.432290  0.201087
unique(scores$Chromosome)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#把染色体名从阿拉伯数字改为“chr1”、“chrX”的形式
scores[scores$Chromosome==23, "Chromosome"] <- "X"
scores[scores$Chromosome==24, "Chromosome"] <- "Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))

# Important step for accurate length to match back to continual chrom loci
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 1
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -1
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score) - 0.1, max(scores$G.score) + 0.1)
title <- paste0("TCGA ESCA overall copy number gistic score", " ", "n=", length(dataSub$patient))

plot(scores.amp$Start.geno, scores.amp$G.score,
     pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", 
     xlim = c(0,chrom$genome.length), ylim = ylim,
     main = title, cex.main = 2, ylab = "gistic score", xlab = NA,
     cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+0.15,ylim[2]-0.15)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))

#dev.off()

画全部样本的percentage/frequency

#pdf("ESCA_copy_number_percentage.pdf",12,15)
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency<- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency<- scores.del$frequency * -100

# copy number percentage plot
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim<- c(-90,90)
title=paste0("ESCA overall copy number percentage"," ","n=",length(dataSub$patient))

plot(scores.amp$Start.geno, scores.amp$frequency,
     pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", 
     xlim = c(0,chrom$genome.length), ylim = ylim,
     main = title, cex.main = 2, ylab = "gain/loss percentage in cohort", xlab = NA,
     cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .5))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(-80,80)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))

画四个subtype对比的gistic score

pdf("cnv.scores.gistic.pdf",15,12)
par(mfrow=c(4,1), mar = par()$mar + c(3,0,0,3))


### ESCC ###
scores <- read.table("easy_input_ESCC.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)

# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 1
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -1
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score)-0.1,max(scores$G.score)+0.1)
title=paste0("ESCC copy number gistic score"," ","n=",length(ESCC_id))

plot(scores.amp$Start.geno, scores.amp$G.score,
                 pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
                 main = title, cex.main = 2, ylab = "gistic score", xlab = NA,
                 cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+0.05,ylim[2]-0.05)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))


### EAC ###
scores <- read.table("easy_input_EAC.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)

# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -100
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score)-0.1,max(scores$G.score)+0.1)
title=paste0("EAC copy number gistic score"," ","n=",length(EAC_id))

plot(scores.amp$Start.geno, scores.amp$G.score,
                 pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
                 main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
                 cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))


### GA_CIN ###
scores <- read.table("easy_input_GA_CIN.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)

# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 1
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -1
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score)-0.1,max(scores$G.score)+0.1)
title=paste0("GA_CIN copy number gistic score"," ","n=",length(GA_CIN_id))

plot(scores.amp$Start.geno, scores.amp$G.score,
                 pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
                 main = title, cex.main = 2, ylab = "gistic score", xlab = NA,
                 cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+0.05,ylim[2]-0.05)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))


### GA_nonCIN ###
scores <- read.table("easy_input_GA_nonCIN.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)

# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 1
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -1
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score)-0.1,max(scores$G.score)+0.1)
title=paste0("GA_nonCIN copy number gistic score"," ","n=",length(GA_nonCIN_id))

plot(scores.amp$Start.geno, scores.amp$G.score,
                 pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
                 main = title, cex.main = 2, ylab = "gistic score", xlab = NA,
                 cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+0.05,ylim[2]-0.05)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
dev.off()
## quartz_off_screen 
##                 2

四种subtype的percentage/frequence

pdf("cnv.frequence.pdf",15,12)
par(mfrow=c(4,1), mar = par()$mar + c(3,0,0,3))

### ESCC ###
scores <- read.table("easy_input_ESCC.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)

# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency <- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency <- scores.del$frequency * -100
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$frequency)-0.1,max(scores$frequency)+0.1)
title=paste0("ESCC, n=",length(ESCC_id))

plot(scores.amp$Start.geno, scores.amp$frequency,
                 pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
                 main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
                 cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))


### EAC ###
scores <- read.table("easy_input_EAC.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)

# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency <- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency <- scores.del$frequency * -100
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$frequency)-0.1,max(scores$frequency)+0.1)
title=paste0("EAC, n=",length(EAC_id))

plot(scores.amp$Start.geno, scores.amp$frequency,
                 pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
                 main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
                 cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))


### GA_CIN ###
scores <- read.table("easy_input_GA_CIN.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)

# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency <- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency <- scores.del$frequency * -100
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$frequency)-0.1,max(scores$frequency)+0.1)
title=paste0("GA_CIN, n=",length(GA_CIN_id))

plot(scores.amp$Start.geno, scores.amp$frequency,
                 pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
                 main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
                 cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))


### GA_nonCIN ###
scores <- read.table("easy_input_GA_nonCIN.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)

# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]

# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency <- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency <- scores.del$frequency * -100
scores <- rbind.data.frame(scores.amp,scores.del)

# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$frequency)-0.1,max(scores$frequency)+0.1)
title=paste0("GA_nonCIN, n=",length(GA_nonCIN_id))

plot(scores.amp$Start.geno, scores.amp$frequency,
                 pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
                 main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
                 cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)

col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
dev.off()
## quartz_off_screen 
##                 2

附二:GISTIC 2.0用法

To generate discrete copy number data file you may need to run GISTIC 2.0. GISTIC 2.0 can be [installed]http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=216&p=t or run online using the GISTIC 2.0 module on [GenePattern]https://cloud.genepattern.org/. Running GISTIC 2.0 requires two input files:

出自:https://cbioportal.readthedocs.io/en/latest/Data-Loading-Tips-and-Best-Practices.html

GISTIC2.0在线版根据说明点鼠标即可

In linux system run GISTIC2

source /etc/profile
source /etc/profile.d/modules.sh

module add impi/5.1.3
module add intel/16.0.3
module add java/1.8.0_91

#mpirun -np 48 /share/apps/vasp/5.4.1/intel/16.0.2/bin/vasp_std

echo --- creating output directory --- 
basedir=`pwd`/esca.basal.seg #esca.luma.seg 
mkdir -p $basedir
echo --- running GISTIC ---

## input file definitions

segfile=`pwd`/esca.basal.seg.cnv.txt #esca.luma.seg.cnv.txt
refgenefile=`pwd`/refgenefiles/hg19.UCSC.add_miR.140312.refgene.mat

## call script that sets MCR environment and calls GISTIC executable

./gistic2 -b $basedir -seg $segfile -refgene $refgenefile -genegistic 1 -smallmem 1 -broad 1 -brlen 0.5 -conf 0.95 -armpeel 1 -savegene 1 -gcm extreme
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.14.3
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] TCGAbiolinks_2.10.4               BSgenome.Hsapiens.UCSC.hg19_1.4.0
##  [3] BSgenome_1.50.0                   rtracklayer_1.42.2               
##  [5] Biostrings_2.50.2                 XVector_0.22.0                   
##  [7] GenomicRanges_1.34.0              GenomeInfoDb_1.18.2              
##  [9] IRanges_2.16.0                    S4Vectors_0.20.1                 
## [11] BiocGenerics_0.28.0              
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.3               circlize_0.4.5               
##   [3] AnnotationHub_2.14.4          aroma.light_3.12.0           
##   [5] plyr_1.8.4                    selectr_0.4-1                
##   [7] ConsensusClusterPlus_1.46.0   lazyeval_0.2.1               
##   [9] splines_3.5.1                 BiocParallel_1.16.6          
##  [11] ggplot2_3.1.0                 sva_3.30.1                   
##  [13] digest_0.6.18                 foreach_1.4.4                
##  [15] htmltools_0.3.6               magrittr_1.5                 
##  [17] memoise_1.1.0                 cluster_2.0.7-1              
##  [19] doParallel_1.0.14             limma_3.38.3                 
##  [21] ComplexHeatmap_1.20.0         readr_1.3.1                  
##  [23] annotate_1.60.1               matrixStats_0.54.0           
##  [25] sesameData_1.0.0              R.utils_2.8.0                
##  [27] prettyunits_1.0.2             colorspace_1.4-0             
##  [29] blob_1.1.1                    rvest_0.3.2                  
##  [31] ggrepel_0.8.0                 xfun_0.5                     
##  [33] dplyr_0.8.0.1                 crayon_1.3.4                 
##  [35] RCurl_1.95-4.12               jsonlite_1.6                 
##  [37] genefilter_1.64.0             zoo_1.8-4                    
##  [39] survival_2.43-3               iterators_1.0.10             
##  [41] glue_1.3.0                    survminer_0.4.3              
##  [43] gtable_0.2.0                  sesame_1.0.0                 
##  [45] zlibbioc_1.28.0               GetoptLong_0.1.7             
##  [47] DelayedArray_0.8.0            wheatmap_0.1.0               
##  [49] shape_1.4.4                   scales_1.0.0                 
##  [51] DESeq_1.34.1                  DBI_1.0.0                    
##  [53] edgeR_3.24.3                  ggthemes_4.1.0               
##  [55] Rcpp_1.0.0                    cmprsk_2.2-7                 
##  [57] xtable_1.8-3                  progress_1.2.0               
##  [59] bit_1.1-14                    matlab_1.0.2                 
##  [61] km.ci_0.5-2                   preprocessCore_1.44.0        
##  [63] httr_1.4.0                    RColorBrewer_1.1-2           
##  [65] pkgconfig_2.0.2               XML_3.98-1.19                
##  [67] R.methodsS3_1.7.1             locfit_1.5-9.1               
##  [69] DNAcopy_1.56.0                tidyselect_0.2.5             
##  [71] rlang_0.3.1                   later_0.8.0                  
##  [73] AnnotationDbi_1.44.0          munsell_0.5.0                
##  [75] tools_3.5.1                   downloader_0.4               
##  [77] generics_0.0.2                RSQLite_2.1.1                
##  [79] ExperimentHub_1.8.0           broom_0.5.1                  
##  [81] evaluate_0.13                 stringr_1.4.0                
##  [83] yaml_2.2.0                    knitr_1.22                   
##  [85] bit64_0.9-7                   survMisc_0.5.5               
##  [87] purrr_0.3.1                   randomForest_4.6-14          
##  [89] EDASeq_2.16.3                 nlme_3.1-137                 
##  [91] mime_0.6                      R.oo_1.22.0                  
##  [93] xml2_1.2.0                    biomaRt_2.38.0               
##  [95] compiler_3.5.1                curl_3.3                     
##  [97] interactiveDisplayBase_1.20.0 tibble_2.0.1                 
##  [99] geneplotter_1.60.0            stringi_1.3.1                
## [101] GenomicFeatures_1.34.4        lattice_0.20-38              
## [103] Matrix_1.2-15                 KMsurv_0.1-5                 
## [105] pillar_1.3.1                  BiocManager_1.30.4           
## [107] GlobalOptions_0.1.0           data.table_1.12.0            
## [109] bitops_1.0-6                  httpuv_1.4.5.1               
## [111] R6_2.4.0                      latticeExtra_0.6-28          
## [113] hwriter_1.3.2                 promises_1.0.1               
## [115] ShortRead_1.40.0              gridExtra_2.3                
## [117] codetools_0.2-16              assertthat_0.2.0             
## [119] SummarizedExperiment_1.12.0   rjson_0.2.20                 
## [121] GenomicAlignments_1.18.1      Rsamtools_1.34.1             
## [123] GenomeInfoDbData_1.2.0        mgcv_1.8-27                  
## [125] hms_0.4.2                     grid_3.5.1                   
## [127] tidyr_0.8.3                   rmarkdown_1.11               
## [129] ggpubr_0.2                    Biobase_2.42.0               
## [131] shiny_1.2.0